Abstract

We introduce the concept of control centrality to quantify the ability of a single node to control
a directed weighted network. We calculate the distribution of control centrality for several real
networks and find that it is mainly determined by the network's degree distribution. We rigorously
prove that in a directed network without cycles the control centrality of a node is
determined by its layer index or topological position in the underlying hierarchical structure of
the network. Inspired by the deep relation between control centrality and hierarchical structure
in a general directed network, we design an efficient attack strategy against the controllability of
malicious networks.

Complex networks have been at the forefront of statistical mechanics for more than a
decade [BB]. Studies of them impact our understanding and control of a wide range of
systems, from the Internet and the power grid to cellular and ecological networks. Despite the
diversity of complex networks, several basic universal principles have been uncovered that 
govern their topology and evolution (3j H] . While these principles have significantly enriched 
our understanding of many networks that affect our lives, our ultimate goal is to develop 
the capability to control them [514TT]. 

According to control theory, a dynamical system is controllable if, with a suitable choice 
of inputs, it can be driven from any initial state to any desired final state in finite time [TBT 
[20] . By combining tools from control theory and network science, we proposed an efficient 
methodology to identify the minimum sets of driver nodes, whose time-dependent control can 
guide the whole network to any desired final state [12]. Yet this minimum driver set (MDS)
is usually not unique: one can often achieve multiple potential control configurations
with the same number of driver nodes. Given that some nodes may appear in some MDSs
but not in others, a crucial question remains unanswered: what is the role of each individual
node in controlling a complex system? The question that we address in this paper therefore
pertains to the importance of a given node in maintaining a system's controllability.

Consider a complex system described by a directed weighted network of $N$ nodes whose
time evolution follows the linear time-invariant dynamics

$$\dot{\mathbf{x}}(t) = A\,\mathbf{x}(t) + B\,\mathbf{u}(t) \qquad (1)$$

where $\mathbf{x}(t) = (x_1(t), x_2(t), \dots, x_N(t))^T \in \mathbb{R}^N$ captures the state of each node at time $t$.
$A \in \mathbb{R}^{N \times N}$ is an $N \times N$ matrix describing the weighted wiring diagram of the network. The
matrix element $a_{ij} \in \mathbb{R}$ gives the strength or weight with which node $j$ affects node $i$. A positive
(or negative) value of $a_{ij}$ means the link $(j \to i)$ is excitatory (or inhibitory). $B \in \mathbb{R}^{N \times M}$
is an $N \times M$ input matrix ($M \le N$) identifying the nodes that are controlled by the
time-dependent input vector $\mathbf{u}(t) = (u_1(t), u_2(t), \dots, u_M(t))^T \in \mathbb{R}^M$ with $M$ independent signals
imposed by an outside controller. The matrix element $b_{ij} \in \mathbb{R}$ represents the coupling
strength between the input signal $u_j(t)$ and node $i$. The system (1), also denoted as $(A, B)$,
is controllable if and only if its controllability matrix $C = (B, AB, \dots, A^{N-1}B) \in \mathbb{R}^{N \times NM}$
has full rank, a criterion often called Kalman's controllability rank condition [18]. The rank of
the controllability matrix $C$, denoted by $\mathrm{rank}(C)$, provides the dimension of the controllable
subspace of the system $(A, B)$ [18, 19]. When we control node $i$ only, $B$ reduces to the vector
$\mathbf{b}^{(i)}$ with a single non-zero entry, and we denote $C$ by $C^{(i)}$. We can therefore use $\mathrm{rank}(C^{(i)})$
as a natural measure of node $i$'s ability to control the system: if $\mathrm{rank}(C^{(i)}) = N$, then node
$i$ alone can control the whole system, i.e. it can drive the system between any points in the
$N$-dimensional state space in finite time. Any value of $\mathrm{rank}(C^{(i)})$ less than $N$ provides the
dimension of the subspace $i$ can control. In particular, if $\mathrm{rank}(C^{(i)}) = 1$, then node $i$ can
only control itself.
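
As an illustration of the rank condition for a single controlled node, the following minimal sketch builds $C^{(i)} = (\mathbf{b}^{(i)}, A\mathbf{b}^{(i)}, \dots, A^{N-1}\mathbf{b}^{(i)})$ for a given weight matrix and computes its rank exactly with rational arithmetic. The function names (`mat_vec`, `rank`, `single_node_rank`) and the example weights are assumptions made for this sketch, not part of the original work; nodes are indexed from 0 and `A[i][j]` is the weight of the link $j \to i$, as in Eq. (1).

```python
from fractions import Fraction


def mat_vec(A, v):
    """Multiply a square matrix A (list of rows) by a column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(rows):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            if f != 0:
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def single_node_rank(A, i):
    """rank(C^(i)), where b^(i) is the i-th unit vector (input attached to node i only)."""
    N = len(A)
    col = [1 if k == i else 0 for k in range(N)]
    cols = []
    for _ in range(N):
        cols.append(col)
        col = mat_vec(A, col)
    # A matrix and its transpose have the same rank, so the columns are passed as rows.
    return rank(cols)


# Hypothetical example: a directed chain 0 -> 1 -> 2 with arbitrary non-zero weights.
chain = [[0, 0, 0],
         [2, 0, 0],
         [0, 3, 0]]
print(single_node_rank(chain, 0), single_node_rank(chain, 2))
```

For this chain the head node spans the full three-dimensional state space, whereas the tail node, having no outgoing links, controls only itself.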

The precise value of $\mathrm{rank}(C)$ is difficult to determine because in reality the system
parameters, i.e. the elements of $A$ and $B$, are often not known precisely except the zeros
that mark the absence of connections between components of the system [21]. Hence $A$ and
$B$ are often considered to be structured matrices, i.e. their elements are either fixed zeros
or independent free parameters [21]. Clearly, $\mathrm{rank}(C)$ varies as a function of the free
parameters of $A$ and $B$. However, it achieves its maximal value for all but an exceptional
set of values of the free parameters, which forms a proper variety with Lebesgue measure
zero in the parameter space [22, 23]. This maximal value is called the generic rank of the
controllability matrix $C$, denoted as $\mathrm{rank}_g(C)$, which also represents the generic dimension
of the controllable subspace. When $\mathrm{rank}_g(C) = N$, the system $(A, B)$ is structurally
controllable, i.e. controllable for almost all sets of values of the free parameters of $A$ and $B$
except an exceptional set of values with zero measure [211 1221 I2U [25]. For a single node $i$,
$\mathrm{rank}_g(C^{(i)})$ captures the "power" of $i$ in controlling the whole network, allowing us to define
the control centrality of node $i$ as

$$C_c(i) = \mathrm{rank}_g(C^{(i)}). \qquad (2)$$
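
To make the notion of an exceptional set concrete, here is a short worked example. It is a hypothetical three-node system chosen for illustration, not one of the networks studied in this paper: the input acts on node 1 only, node 1 points to nodes 2 and 3, and nodes 2 and 3 each carry a self-loop.

```latex
A = \begin{pmatrix} 0 & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & 0 & a_{33} \end{pmatrix},
\qquad
\mathbf{b}^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
\qquad
C^{(1)} = \left( \mathbf{b}^{(1)},\, A\mathbf{b}^{(1)},\, A^{2}\mathbf{b}^{(1)} \right)
        = \begin{pmatrix} 1 & 0 & 0 \\ 0 & a_{21} & a_{21}a_{22} \\ 0 & a_{31} & a_{31}a_{33} \end{pmatrix},

\det C^{(1)} = a_{21}\, a_{31}\, (a_{33} - a_{22}).
```

The determinant vanishes only when $a_{22} = a_{33}$ or when one of the link weights $a_{21}$, $a_{31}$ is zero. These conditions define a measure-zero set in the space of free parameters, so $\mathrm{rank}_g(C^{(1)}) = 3$ and $C_c(1) = 3$ in this example, even though particular parameter choices (such as identical self-loop weights) reduce the actual rank to 2.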

The calculation of $\mathrm{rank}_g(C)$ can be mapped into a combinatorial optimization problem
on a directed graph $G(A, B)$ constructed as follows [23]. Connect the $M$ input nodes
$\{u_1, \dots, u_M\}$ to the $N$ state nodes $\{x_1, \dots, x_N\}$ in the original network according to the
input matrix $B$, i.e. connect $u_j$ to $x_i$ if $b_{ij} \neq 0$, obtaining a directed graph $G(A, B)$ with
$N + M$ nodes (see Fig. 1a and b). A state node $j$ is called accessible if there is at least one
directed path reaching from one of the input nodes to node $j$. In Fig. 1b, all state nodes
$\{x_1, \dots, x_7\}$ are accessible from the input node $u_1$. A stem is a directed path starting from
an input node, so that no nodes appear more than once in it, e.g. $u_1 \to x_1 \to x_5 \to x_7$ in
Fig. 1b. Denote with $G_s$ the stem-cycle disjoint subgraph of $G(A, B)$, such that $G_s$ consists
of stems and cycles only, and the stems and cycles have no node in common (highlighted
in Fig. 1b). According to Hosoe's theorem [23], the generic dimension of the controllable
subspace is given by

$$\mathrm{rank}_g(C) = \max_{G_s \in \mathcal{G}} |E(G_s)| \qquad (3)$$

with $\mathcal{G}$ the set of all stem-cycle disjoint subgraphs of the accessible part of $G(A, B)$ and
$|E(G_s)|$ the number of edges in the subgraph $G_s$. For example, the subgraph highlighted
in Fig. 1b, denoted as $G_s^{\max}$, contains the largest number of edges among all possible
stem-cycle disjoint subgraphs. Thus, $C_c(1) = \mathrm{rank}_g(C^{(1)}) = 6$, which is the number of red links in
Fig. 1b. Since $\mathrm{rank}_g(C^{(1)}) = 6 < N = 7$, the whole system is not structurally
controllable by controlling $x_1$ only. Yet, the nodes covered by the $G_s^{\max}$ highlighted in Fig. 1b,
i.e. $\{x_1, x_2, x_3, x_4, x_5, x_7\}$, constitute a structurally controllable subsystem [25]. In other
words, by controlling node $x_1$ with a time-dependent signal $u_1(t)$ we can drive the subsystem
$\{x_1, x_2, x_3, x_4, x_5, x_7\}$ from any initial state to any final state in finite time, for almost all
sets of values of the free parameters of $A$ and $B$ except an exceptional set of values with
zero measure. In general $G_s^{\max}$ is not unique. For example, in Fig. 1b we can get the same
cycle $x_2 \to x_3 \to x_4 \to x_2$ together with a different stem $u_1 \to x_1 \to x_5 \to x_6$, which yields a
different $G_s^{\max}$ and thus a different structurally controllable subsystem $\{x_1, x_2, x_3, x_4, x_5, x_6\}$.
Both subsystems are of size six, which is exactly the generic dimension of the controllable
subspace. Note that we can fully control each subsystem individually, yet we cannot fully
control the whole system.

The advantage of Eq. (3) is that $\max_{G_s \in \mathcal{G}} |E(G_s)|$ can be calculated via linear
programming [2S], providing us with an efficient numerical tool to determine the control centrality and
the structurally controllable subsystem of any node in an arbitrary complex network (see
Supplementary Material Sec. I.A).
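
For very small graphs, the maximization in Eq. (3) can be checked without a linear-programming solver. The sketch below is not the linear program referred to above; it swaps in an exhaustive dynamic program over subsets, which scales exponentially and is only meant for toy examples. It relies on one modelling assumption of ours: an artificial zero-weight return edge from every state node to the input node turns the stem into a cycle, so that a stem-cycle disjoint subgraph becomes a choice of one successor per node. The function name `control_centrality` and the edge list below (a guess at a network compatible with the description of Fig. 1b, with $x_1, \dots, x_7$ mapped to indices 0 to 6) are hypothetical.

```python
from functools import lru_cache


def control_centrality(n_state, edges, driven):
    """Max number of edges in a stem-cycle disjoint subgraph when a single input
    node is attached to state node `driven`.

    n_state: number of state nodes, labelled 0 .. n_state - 1
    edges:   iterable of (j, i) pairs meaning a link j -> i (self-loops allowed)
    """
    succ = {v: set() for v in range(n_state)}
    for j, i in edges:
        succ[j].add(i)

    # Accessible part: state nodes reachable from the driven node.
    reach, stack = {driven}, [driven]
    while stack:
        v = stack.pop()
        for w in succ[v]:
            if w not in reach:
                reach.add(w)
                stack.append(w)

    nodes = sorted(reach)
    idx = {v: k + 1 for k, v in enumerate(nodes)}  # position 0 is the input node
    n = len(nodes) + 1

    # weight[a][b]: gain of choosing b as the successor of a; None if not allowed.
    weight = [[None] * n for _ in range(n)]
    for a in range(n):
        weight[a][a] = 0                       # node left out of the subgraph
    weight[0][idx[driven]] = 1                 # input edge u -> x_driven
    for v in nodes:
        weight[idx[v]][0] = 0                  # artificial return edge closing the stem
        for w in succ[v]:
            weight[idx[v]][idx[w]] = 1         # original link (a self-loop overrides the 0)

    @lru_cache(maxsize=None)
    def best(a, used):
        """Best total gain for nodes a .. n-1; `used` is a bitmask of taken successors."""
        if a == n:
            return 0
        top = float("-inf")
        for b in range(n):
            if weight[a][b] is not None and not used & (1 << b):
                top = max(top, weight[a][b] + best(a + 1, used | (1 << b)))
        return top

    return best(0, 0)


# Hypothetical edge list: x1 -> x2, cycle x2 -> x3 -> x4 -> x2, x1 -> x5, x5 -> x6, x5 -> x7.
fig1_like = [(0, 1), (1, 2), (2, 3), (3, 1), (0, 4), (4, 5), (4, 6)]
print(control_centrality(7, fig1_like, 0))
```

With this assumed edge list the function returns 6 for $x_1$, matching the stem plus cycle counted in the text; $x_6$ and $x_7$ share their only predecessor $x_5$, so they cannot both be covered.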

We first consider the distribution of control centrality. Shown in Fig. 2 is the distribution
of the normalized control centrality ($c_c(i) \equiv C_c(i)/N$) for several real networks. We find
that for the intra-organization network, $P(c_c)$ has a sharp peak at $c_c = 1$, suggesting that
a high fraction of nodes can individually exert full control over the whole system (Fig. 2a).
In contrast, for the company-ownership network, $P(c_c)$ follows an approximately exponential
distribution (Fig. 2d), indicating that most nodes display low control centrality. Even the
most powerful node, with $c_c \sim 0.01$, can control only one percent of the total dimension of the
system's full state space. For other networks $P(c_c)$ displays a mixed behavior, indicating the
coexistence of a few powerful nodes with a large number of nodes that have little control over
the system's dynamics (Fig. 2b, c). Note that under full randomization, turning a network
into a directed Erdős-Rényi (ER) random network [27, 28] with the number of nodes ($N$) and
the number of edges ($L$) unchanged, the $c_c$ distribution changes dramatically. In contrast, under
degree-preserving randomization [29, 30], which keeps the in-degree ($k_{\rm in}$) and out-degree
($k_{\rm out}$) of each node unchanged, the $c_c$ distribution does not change significantly. This result
suggests that $P(c_c)$ is mainly determined by the underlying network's degree distribution
$P(k_{\rm in}, k_{\rm out})$. This is useful in practice: $P(k_{\rm in}, k_{\rm out})$ is easy to calculate
for any complex network, while the calculation of $P(c_c)$ requires much more computational
effort (both CPU time and memory space). Studying $P(c_c)$ for model networks of prescribed
$P(k_{\rm in}, k_{\rm out})$ will give us a qualitative understanding of how $P(c_c)$ changes as we vary network
parameters, e.g. the mean degree $\langle k \rangle$. See Supplementary Material Sec. II for more details.
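
The two null models can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure of Refs. [29, 30]: the ER version redraws all links uniformly while excluding self-loops and duplicate links, and the degree-preserving version repeatedly swaps the targets of two randomly chosen links, rejecting swaps that would create self-loops or duplicates. The function names and the `swaps_per_edge` parameter are hypothetical.

```python
import random


def randomize_er(n, edges, rng):
    """Keep N and L fixed and redraw every directed link uniformly at random.
    Assumes L <= N (N - 1); self-loops and duplicate links are excluded."""
    target = len(set(edges))
    new = set()
    while len(new) < target:
        j, i = rng.randrange(n), rng.randrange(n)
        if j != i:
            new.add((j, i))
    return sorted(new)


def randomize_degree_preserving(edges, rng, swaps_per_edge=10):
    """Rewire by swapping link targets: (a -> b, c -> d) becomes (a -> d, c -> b).
    Every node keeps its in-degree and out-degree."""
    edge_list = list(set(edges))
    present = set(edge_list)
    for _ in range(swaps_per_edge * len(edge_list)):
        p, q = rng.randrange(len(edge_list)), rng.randrange(len(edge_list))
        (a, b), (c, d) = edge_list[p], edge_list[q]
        if p == q or a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # reject swaps that would create self-loops or duplicate links
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edge_list[p], edge_list[q] = (a, d), (c, b)
    return sorted(present)


def degree_sequence(n, edges):
    """List of (k_in, k_out) pairs, one per node."""
    k_in, k_out = [0] * n, [0] * n
    for j, i in edges:
        k_out[j] += 1
        k_in[i] += 1
    return list(zip(k_in, k_out))


# Hypothetical toy network with 6 nodes and 8 directed links.
toy = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 0)]
rng = random.Random(1)
rewired = randomize_degree_preserving(toy, rng)
print(degree_sequence(6, rewired) == degree_sequence(6, toy))
```

Comparing $P(c_c)$ of a network with that of its `randomize_er` and `randomize_degree_preserving` counterparts is the kind of test described above; the sketch only produces the randomized edge lists.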

To understand which topological features determine the control centrality itself, we
compared the control centrality for each node in the real networks and their randomized
counterparts (denoted as rand-ER and rand-Degree). The lack of correlations indicates that both
randomization procedures eliminate the topological feature that determines the control
centrality of a given node (see Supplementary Material Sec. I.B). Since accessibility plays an
important role in maintaining structural controllability [21], we conjecture that the control
centrality of node $i$ is correlated with the number of nodes $N_r(i)$ that can be reached from it.
To test this conjecture, we calculated $N_r(i)$ and $C_c(i)$ for the real networks shown in Fig. 2,
observing only a weak correlation between the two quantities (see Supplementary Material
Sec. I.C). This lack of correlation between $N_r(i)$ and $C_c(i)$ is obvious in a directed star, in
which a central hub ($x_1$) points to $N - 1$ leaf nodes ($x_2, \dots, x_N$) (Fig. 1c). As the central
hub can reach all nodes, $N_r(1) = N$, suggesting that it should have high control centrality.
Yet, one can easily check that the central hub has control centrality $C_c(1) = 2$ for any $N \ge 2$
and there are $N - 1$ structurally controllable subsystems, i.e. $\{x_1, x_2\}, \dots, \{x_1, x_N\}$. In
other words, by controlling the central hub we can fully control each leaf node individually,
but we cannot control them collectively.
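
The value $C_c(1) = 2$ for the directed star can also be read off from Kalman's matrix directly. The short derivation below assumes arbitrary non-zero weights $a_{i1}$ on the links from the hub $x_1$ to the leaves and no other links.

```latex
A = \begin{pmatrix}
      0      & 0      & \cdots & 0 \\
      a_{21} & 0      & \cdots & 0 \\
      \vdots & \vdots &        & \vdots \\
      a_{N1} & 0      & \cdots & 0
    \end{pmatrix},
\qquad
\mathbf{b}^{(1)} = (1, 0, \dots, 0)^{T},

A\,\mathbf{b}^{(1)} = (0, a_{21}, \dots, a_{N1})^{T},
\qquad
A^{2} = 0 \;\Rightarrow\; A^{k}\mathbf{b}^{(1)} = 0 \quad (k \ge 2),

C^{(1)} = \left( \mathbf{b}^{(1)},\, A\mathbf{b}^{(1)},\, 0,\, \dots,\, 0 \right),
\qquad
\mathrm{rank}\, C^{(1)} = 2 .
```

The controllable subspace is spanned by the hub direction and the single vector $(0, a_{21}, \dots, a_{N1})^{T}$, so all leaves are driven in fixed proportion to one another. This is why each pair $\{x_1, x_k\}$ is controllable on its own while the leaves cannot be steered independently.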

Note that in a directed star each node can be labeled with a unique layer index: the
leaf nodes are in the first layer (bottom layer) and the central hub is in the second layer
(top layer). In this case the control centrality of the central hub equals its layer index (see
Fig. 1c). This is not a coincidence: we can prove that for a directed network containing
no cycles, often called a directed acyclic graph (DAG), the control centrality of any node
equals its layer index

$$C_c(i) = l_i. \qquad (4)$$

Indeed, lacking cycles, a DAG has a unique hierarchical structure, which means that each
node can be labeled with a unique layer index ($l_i$), calculated using a recursive labeling
algorithm [31]: (1) Nodes that have no outgoing links ($k_{\rm out} = 0$) are labeled with layer index
1 (bottom layer). (2) Remove all nodes in layer 1. For the remaining graph identify again all
nodes with $k_{\rm out} = 0$ and label them with layer index 2. (3) Repeat step (2) until all nodes
are labeled. As the DAG lacks cycles, each subgraph in the set $\mathcal{G}$ of the directed graph
$G(A, \mathbf{b}^{(i)})$ consists of a stem only, which starts from the input node pointing to the state
node $i$; the longest such stem ends at a state node in the bottom layer, e.g. $u_1 \to x_1 \to x_2 \to x_4$
in Fig. 1d. The number of edges in this longest stem is equal to the layer index of node $i$, so
$\mathrm{rank}_g(C^{(i)}) = C_c(i) = l_i$.
Therefore in a DAG the higher a node is in the hierarchy, the higher is its ability to control
the system. Though this result agrees with our intuition to some extent, it is surprising
at first glance because it indicates that in a DAG the control centrality of node $i$ is
determined only by its topological position in the hierarchical structure, rather than by any
other importance measure, e.g. degree or betweenness centrality. This result also partially
explains why driver nodes tend to avoid hubs [12].
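
The three labeling steps above can be sketched as a peeling procedure. This is a minimal illustration, not the implementation of Ref. [31]; the function name `layer_indices` and the example DAG are assumptions made here, with nodes indexed from 0 and edges given as `(j, i)` pairs for links $j \to i$.

```python
def layer_indices(n, edges):
    """Layer index of every node of a DAG, obtained by repeatedly removing
    the nodes with k_out = 0 (bottom layer has index 1)."""
    out_deg = [0] * n
    preds = [[] for _ in range(n)]
    for j, i in set(edges):
        out_deg[j] += 1
        preds[i].append(j)

    layer = [0] * n
    current = [v for v in range(n) if out_deg[v] == 0]  # step (1): nodes with k_out = 0
    level = 0
    labelled = 0
    while current:
        level += 1
        nxt = []
        for v in current:
            layer[v] = level
            labelled += 1
            # Step (2): removing v lowers the out-degree of each of its predecessors.
            for p in preds[v]:
                out_deg[p] -= 1
                if out_deg[p] == 0:
                    nxt.append(p)
        current = nxt  # step (3): repeat with the newly exposed nodes
    if labelled != n:
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return layer


# Hypothetical DAG: 0 -> 1 -> 3 and 0 -> 2.
print(layer_indices(4, [(0, 1), (1, 3), (0, 2)]))
```

By Eq. (4), the returned layer indices are also the control centralities of the nodes of the DAG, obtained here in time linear in the number of nodes and links.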

Despite the simplicity of Eq. (4), we cannot apply it directly to real networks, because
most of them are not DAGs. Yet, we note that any directed network has an underlying DAG
structure based on the strongly connected component (SCC) decomposition (see
Supplementary Material Sec. I.D, Fig. S4). A subgraph of a directed network is strongly connected if
there is a directed path from each node in the subgraph to every other node. The SCCs of a
directed network $G$ are its maximal strongly connected subgraphs. If we contract each SCC
to a single supernode, the resulting graph, called the condensation of $G$, is a DAG [32].
Since a DAG has a unique hierarchical structure, a directed network can then be assigned an
underlying hierarchical structure. The layer index of node $i$ can be defined to be the layer
index of the corresponding supernode (i.e. the SCC that node $i$ belongs to) in the condensation
of $G$. With this definition of $l_i$, it is easy to show that $C_c(i) \ge l_i$ for general directed
networks. Furthermore, for an edge $(i \to j)$ in a general directed network, if node $i$ is
topologically "higher" than node $j$ (i.e. $l_i > l_j$), then $C_c(i) > C_c(j)$. Since $C_c(i)$ has to be
calculated via linear programming, which is computationally more challenging than the
calculation of $l_i$, the above results suggest an efficient way to calculate a lower bound of
$C_c(i)$ and to compare the control centralities of two neighboring nodes. Note that if $l_i > l_j$
and there is no directed edge $(i \to j)$ in the network, then in general one cannot conclude that
$C_c(i) > C_c(j)$ (see Supplementary Material Sec. I.D for more details).
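
A possible way to compute $l_i$ for a general directed network is sketched below: find the SCCs, then assign each SCC a layer equal to one plus the largest layer among the SCCs it points to. The choice of Kosaraju's two-pass algorithm and the function name `scc_layer_indices` are ours, and the example edge list is the same hypothetical guess at a Fig. 1b-like network used for illustration only.

```python
def scc_layer_indices(n, edges):
    """Layer index l_i of each node: the layer of its SCC in the condensation DAG."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for j, i in set(edges):
        succ[j].append(i)
        pred[i].append(j)

    # Pass 1 (Kosaraju): record nodes in order of depth-first completion.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            w = next((x for x in it if not seen[x]), None)
            if w is None:
                order.append(v)
                stack.pop()
            else:
                seen[w] = True
                stack.append((w, iter(succ[w])))

    # Pass 2: sweep the reversed graph in reverse completion order; each sweep is one SCC.
    # Components come out in topological order of the condensation (sources first).
    comp = [-1] * n
    n_comp = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = n_comp
        stack = [s]
        while stack:
            v = stack.pop()
            for w in pred[v]:
                if comp[w] == -1:
                    comp[w] = n_comp
                    stack.append(w)
        n_comp += 1

    # Layer of an SCC = 1 + largest layer among the SCCs it points to (1 for sinks).
    comp_succ = [set() for _ in range(n_comp)]
    for j, i in edges:
        if comp[j] != comp[i]:
            comp_succ[comp[j]].add(comp[i])
    comp_layer = [1] * n_comp
    for c in reversed(range(n_comp)):  # successor components have larger indices
        comp_layer[c] = 1 + max((comp_layer[d] for d in comp_succ[c]), default=0)
    return [comp_layer[comp[v]] for v in range(n)]


# Hypothetical edge list: x1 -> x2, cycle x2 -> x3 -> x4 -> x2, x1 -> x5, x5 -> x6, x5 -> x7.
fig1_like = [(0, 1), (1, 2), (2, 3), (3, 1), (0, 4), (4, 5), (4, 6)]
print(scc_layer_indices(7, fig1_like))
```

For this assumed network $x_1$ sits in layer 3, which is a valid but loose lower bound on $C_c(1) = 6$, since the three-node cycle is contracted into a single bottom-layer supernode.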

Our finding on the relation between control centrality and hierarchical structure inspires
us to design an efficient attack strategy against malicious networks, aiming to affect their
controllability. The most efficient way to damage the controllability of a network is to
remove all input nodes $\{u_1, u_2, \dots, u_M\}$, rendering the system completely uncontrollable.
But this requires a detailed knowledge of the control configuration, i.e. the wiring diagram
of $G(A, B)$, which we often lack. If the network structure ($A$) is known, one can attempt
a targeted attack, i.e. rank the nodes according to some centrality measure, like degree
or control centrality, and remove the nodes with the highest centralities [331 US]. Though we
still lack systematic studies on the effect of a targeted attack on a network's controllability,
one naively expects that this should be the most efficient strategy. But we often lack the
knowledge of the network structure, which makes this approach infeasible anyway. In this
case a simple strategy would be a random attack, i.e. removing a randomly chosen fraction $P$
of nodes, which naturally serves as a benchmark for any other strategy. Here we propose
instead a random upstream attack strategy: randomly choose a fraction $P$ of nodes, and for
each node remove one of its incoming or upstream neighbors if it has one, otherwise remove
the node itself. A random downstream attack can be defined similarly, removing the node
to which the chosen node points. In undirected networks, a similar strategy has been
proposed for efficient immunization [31] and the early detection of contagious outbreaks [35],
relying on the statistical trend that randomly selected neighbors have more links than the
node itself [36, 37]. In directed networks we can prove that randomly selected upstream
(or downstream) neighbors have more outgoing (or incoming) links than the node itself
(see Supplementary Material Sec. III.A). Thus a random upstream (or downstream) attack
will remove more hubs and more links than the random attack does. But the real reason
why we expect a random upstream attack to be efficient in a directed network is that
$C_c(i) \ge C_c(j)$ for most edges $(i \to j)$, i.e. the control centrality of the starting node of a
directed edge is usually no less than that of the ending node. In DAGs, for any edge $(i \to j)$,
we have strictly $C_c(i) > C_c(j)$ (see Supplementary Material Sec. III.B). Thus, the upstream
neighbor of a node is expected to play a more important or equal role in control than the
node itself, a result deeply rooted in the nature of the control problem, rather than the hub
status of the upstream nodes.
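
The node-selection step of the random upstream attack can be sketched as follows. The text does not specify several details, so the choices here are assumptions: the number of chosen nodes is $\lfloor P N \rfloor$, upstream neighbors are drawn from the original network, self-loops are ignored, and a node selected for removal more than once is simply removed once, so that fewer than $\lfloor P N \rfloor$ nodes may end up removed. The function name `random_upstream_attack` is hypothetical.

```python
import math
import random


def random_upstream_attack(n, edges, fraction, rng):
    """Return the set of nodes removed by a random upstream attack.

    edges are (j, i) pairs meaning j -> i, so j is an upstream neighbor of i.
    """
    preds = [[] for _ in range(n)]
    for j, i in set(edges):
        if j != i:                      # assumption: a self-loop is not an upstream neighbor
            preds[i].append(j)
    for p in preds:
        p.sort()                        # fixed order, so results depend only on the seed

    chosen = rng.sample(range(n), math.floor(fraction * n))
    removed = set()
    for v in chosen:
        # Remove one random upstream neighbor if there is one, otherwise the node itself.
        removed.add(rng.choice(preds[v]) if preds[v] else v)
    return removed


# Directed star: hub 0 points to leaves 1..5. Whatever nodes are chosen, the hub is removed.
star = [(0, k) for k in range(1, 6)]
print(random_upstream_attack(6, star, 0.5, random.Random(0)))
```

A random downstream attack under the same assumptions is obtained by passing the reversed edge list `[(i, j) for j, i in edges]`.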

To show the efficiency of the random upstream attack we compare its impact on fully
controlled networks with that of several other strategies. We start from a network that is fully
controlled ($\mathrm{rank}_g(C) = N$) via a minimum set of $N_D$ driver nodes. After the attack a
fraction $P$ of nodes is removed, and we denote with $\mathrm{rank}_g(C')$ the dimension of the controllable
subspace of the damaged network. We calculate $\mathrm{rank}_g(C')$ as a function of $P$, with $P$
tuned from 0 up to 1. Since the random attack serves as a natural benchmark, we calculate
the difference of $\mathrm{rank}_g(C')$ between a given strategy and the random attack, denoted as
$\delta = [\mathrm{rank}_g^{\text{strategy-}j}(C') - \mathrm{rank}_g^{\text{random}}(C')]/N$. Clearly, the more negative $\delta$ is, the more
efficient the strategy is compared to a fully random attack. We find that for most networks
the random upstream attack results in $\delta < 0$ for $0 < P < 1$, i.e. it causes more damage to the
network's controllability than the random attack (see Fig. 3b, c, d). Moreover, the random upstream
attack is typically more efficient than the random downstream attack, even though in both cases
we remove more hubs and more links than in the random attack. This is due to the fact
that the upstream (or downstream) neighbors are usually more (or less) "powerful" than
the node itself.

The efficiency of the random upstream attack is even comparable to that of targeted attacks
(Fig. 3). Since the former requires only the knowledge of the network's local structure
rather than any knowledge of the nodes' centrality measures or any other global
information (i.e. the structure of the $A$ matrix), while the latter relies heavily on them, this finding
indicates the advantage of the random upstream attack. The fact that those targeted
attacks do not always show significant superiority over the random attack and the random
upstream (downstream) attack could be due to an overlap effect: two targeted nodes
successively chosen from the rank list based on some centrality measure are likely to have larger
overlap between their controllable subspaces than two randomly chosen nodes. Therefore,
successively removing targeted nodes with the highest centralities will not always cause the most
damage to the network's controllability. Random attacks can avoid such overlap to some
extent because the removed nodes are randomly chosen. This also explains why sometimes
targeted attacks are worse than the random attack (see Fig. 3a). Notice that for the
intra-organization network all attack strategies fail in the sense that $\delta$ is either positive or very
close to zero (Fig. 3a). This is due to the fact that this network is so dense ($\langle k \rangle \approx 58$) that
we have $C_c(i) = C_c(j) = N$ for almost all the edges $(i \to j)$. Consequently, neither the random
upstream nor the random downstream attack is efficient, and the $C_c$-targeted attack shows almost
the same impact as the random attack. This result suggests that when the network becomes
very dense its controllability becomes extremely robust against all kinds of attacks,
consistent with our previous result on the core percolation and the control robustness against link
removal [12]. We also tested those attack strategies on model networks (see Supplementary
Material Sec. III.C). The results are qualitatively consistent with what we observed in real
networks.

In sum, we study the control centrality of single nodes in complex networks and find that
it is related to the underlying hierarchical structure of networks. The presented results help
us better understand the controllability of complex networks and design an efficient attack
strategy against network control.

This work was supported by the Network Science Collaborative Technology Alliance sponsored
by the US Army Research Laboratory under Agreement Number W911NF-09-2-0053; the Office of
Naval Research under Agreement Number N000141010968; the Defense Threat Reduction Agency
awards WMD BRBAA07-J-2-0035 and BRBAA08-Per4-C-2-0033; and the James S. McDonnell
Foundation 21st Century Initiative in Studying Complex Systems.

[Figure 1. Panel a: original network of N nodes (N = 7). Panel b: controlled network of (N + M) nodes (N = 7, M = 1), with the input node marked.]

FIG. 1: Control centrality. (a) A simple network of $N = 7$ nodes. (b) The controlled network
is represented by a directed graph $G(A, B)$ with an input node $u_1$ connecting to a state node
$x_1$. The stem-cycle disjoint subgraph $G_s$ (shown in red) contains six edges, which is the largest
number of edges among all possible stem-cycle disjoint subgraphs of the directed graph $G(A, B)$
and corresponds to the generic dimension of the controllable subspace obtained by controlling node $x_1$. The
control centrality of node 1 is thus $C_c(1) = 6$. (c) The control centrality of the central hub in a
directed star is always 2 for any network size $N \ge 2$. (d) The control centrality of a node in a
directed acyclic graph (DAG) equals its layer index. In applying Hosoe's theorem, if not all state
nodes are accessible, we just need to consider the accessible part (highlighted in green) of the input
node(s).

FIG. 2: Distribution of normalized control centrality of several real-world networks
(blue) and their randomized counterparts: rand-ER (red), rand-Degree (green). (a)
Intra-organizational network of a manufacturing company [38]. (b) Hyperlinks between weblogs
on US politics [39]. (c) Email network in a university [10]. (d) Ownership network of US
corporations EH.

FIG. 3: The impact of different attack strategies on network controllability with
respect to the random attack. $\delta = [\mathrm{rank}_g^{\text{strategy-}j}(C') - \mathrm{rank}_g^{\text{random}}(C')]/N$, where
$\mathrm{rank}_g^{\text{strategy-}j}(C')$ represents the generic dimension of the controllable subspace after removing a
fraction $P$ of nodes using strategy-$j$. The nodes are removed according to six different strategies.
(Strategy-0) Random attack: randomly remove a fraction $P$ of nodes. (Strategy-1 or 2) Random
upstream (or downstream) attack: randomly choose a fraction $P$ of nodes, then randomly remove one of
their upstream neighbors (or downstream neighbors). The results are averaged over 10 random
choices of the fraction $P$ of nodes, with error bars defined as s.e.m. Lines are only a guide to the eye.
(Strategy-3, 4, or 5) Targeted attacks: remove the top fraction $P$ of nodes according to their control
centralities (or in-degrees or out-degrees).